[pull] master from DataDog:master#590

Merged
pull[bot] merged 5 commits into ConnectionMaster:master from DataDog:master on Jun 9, 2026

pull[bot] commented on Jun 9, 2026

See Commits and Changes for more details.


Created by pull[bot] (v2.0.0-alpha.4)

ddog-nasirthomas and others added 5 commits June 9, 2026 15:36
…sts metrics (#23957)

* add mapping for updated metric name litellm_remaining_tokens_metric

* changelog

* add sister metric litellm_remaining_requests_metric

* changelog update
…ent events (#23778)

* kafka_consumer: collect Schema Registry compatibility, drop legacy agent events

Add global compatibility (GET /config) and per-subject effective
compatibility (GET /config/{subject}?defaultToGlobal=true) to the
data-streams-message payload for schemas. Per-subject results are
cached alongside version metadata; a compatibility change is included
in the cache key so it triggers re-emission like a schema change.
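
The two Schema Registry lookups described above can be sketched as plain URL builders plus a response parser. This is a minimal illustration, not the check's actual code: the function names are hypothetical, and the `compatibilityLevel` response field is assumed from the Confluent Schema Registry REST API rather than taken from this PR.

```python
from __future__ import annotations

from urllib.parse import quote, urlencode


def global_compatibility_url(base_url: str) -> str:
    """URL for the registry-wide compatibility setting (GET /config)."""
    return f"{base_url.rstrip('/')}/config"


def subject_compatibility_url(base_url: str, subject: str) -> str:
    """URL for a subject's effective compatibility.

    defaultToGlobal=true asks the registry to fall back to the global
    setting when the subject has no explicit override.
    """
    query = urlencode({"defaultToGlobal": "true"})
    # Subjects may contain characters such as "/", so escape them fully.
    return f"{base_url.rstrip('/')}/config/{quote(subject, safe='')}?{query}"


def parse_compatibility(body: dict) -> str | None:
    """Extract the level from a decoded JSON body (assumed field name)."""
    return body.get("compatibilityLevel")
```

For example, `subject_compatibility_url("http://sr:8081", "orders-value")` yields `http://sr:8081/config/orders-value?defaultToGlobal=true`.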

Stop emitting agent-side events for broker, topic, and schema
configurations — the same payloads continue to flow to the
data-streams intake, which is the canonical consumer.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Add changelog entries for #23778

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Refresh per-subject schema compatibility on the configs cadence

Previously per-subject compatibility was only fetched alongside a Tier 2
schema fetch — so a customer flipping a subject's compatibility without
publishing a new version was invisible until the next version bump.

Move compatibility onto its own TTL-driven cache, reusing the same
broker/topic configs refresh interval (default 3 minutes, with the same
jitter). The per-run batch size mirrors the version-check tier (200).
Subjects with a new version this run are still bundled into the same
parallel wave, so we avoid a separate HTTP burst when the two coincide.
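
One way to picture the TTL-driven refresh is the sketch below. It assumes the 3-minute interval and the batch size of 200 quoted in this commit message; the 10% jitter fraction, the function names, and the dict-based expiry store are illustrative assumptions, not the check's real implementation.

```python
import random

REFRESH_INTERVAL_S = 180.0  # 3 minutes, per the commit message
BATCH_SIZE = 200            # mirrors the version-check tier, per the commit message
JITTER_FRACTION = 0.1       # assumption: the real jitter amount is not stated


def next_refresh_at(now: float, interval: float = REFRESH_INTERVAL_S) -> float:
    """Return the time at which a freshly fetched entry expires.

    A random offset spreads refreshes out so that subjects fetched in the
    same run do not all expire in the same later run.
    """
    return now + interval + random.uniform(0.0, interval * JITTER_FRACTION)


def subjects_due(expiry_by_subject: dict, subjects: list, now: float,
                 batch_size: int = BATCH_SIZE) -> list:
    """Pick subjects whose compatibility entry is missing or expired."""
    due = [s for s in subjects if expiry_by_subject.get(s, 0.0) <= now]
    # Cap the work done in a single check run.
    return due[:batch_size]
```

With `expiry_by_subject = {"a": 100.0, "b": 300.0}` and `now=200.0`, subjects `a` (expired) and `c` (never fetched) are due, while `b` is still fresh.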

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Fix two_tier_fetch test to account for compatibility in cache entry

The new compatibility-refresh logic populates the compatibility field on
both the changed subject (via the full-fetch wave) and the unchanged
subject (via the standalone compat-refresh path). Add the mocks and
update the cache assertions to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Include global_compatibility in the schema emission cache key

A global compatibility change with no per-subject change would otherwise
leave subjects with explicit overrides emitting stale global_compatibility
in their payload until the TTL eviction kicked in.
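
The effect of widening the cache key can be sketched as follows. The key layout, the hashing, and the function name are hypothetical; the only point taken from the commit is that both the per-subject and the global compatibility take part in the key, so a change to either forces re-emission.

```python
from __future__ import annotations

import hashlib
import json


def schema_emission_key(subject: str, version: int,
                        compatibility: str | None,
                        global_compatibility: str | None) -> str:
    """Build a stable key from everything that should trigger re-emission."""
    material = json.dumps([subject, version, compatibility, global_compatibility])
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


# A subject with an explicit FULL override: only the global value changes.
before = schema_emission_key("orders-value", 3, "FULL", "BACKWARD")
after = schema_emission_key("orders-value", 3, "FULL", "FORWARD")
changed = before != after  # the key changes, so the payload is emitted again
```

If `global_compatibility` were left out of the key, `before` and `after` would be identical and the stale value would persist until TTL eviction.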

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

* Address review feedback: TypedDict, docstrings, test isolation, compat-flip test

- Add `compatibility: NotRequired[str | None]` to SubjectVersionInfo so the
  TypedDict reflects the actual cache shape after this PR
- Collapse multi-line docstring on private _get_schema_registry_subject_compatibility
  to one-liner per project style guide
- Trim 3-line comment block to one line
- Fix `if subject_compat:` → `if subject_compat is not None:` to avoid
  silently dropping an empty-string API response
- Remove dead test variables (expected_broker_config/topic_config/schema)
- Add compatibility mocks to three existing tests that relied on exception
  handlers swallowing live HTTP calls
- Add test_schema_registry_compatibility_flip_triggers_reemission covering the
  headline behaviour: a compat change without a version bump triggers re-emission
- Consolidate changelog into a single 23778.changed entry
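
The truthiness fix in the list above is easy to get wrong, so here is a small sketch of the difference. The payload shape and function name are hypothetical; only the `is not None` guard comes from the commit.

```python
from __future__ import annotations


def build_schema_payload(subject: str, subject_compat: str | None) -> dict:
    """Attach compatibility whenever the API returned a value, even ""."""
    payload = {"subject": subject}
    # `if subject_compat:` would skip "" because empty strings are falsy;
    # `is not None` only skips a genuinely missing value.
    if subject_compat is not None:
        payload["compatibility"] = subject_compat
    return payload
```

Here `build_schema_payload("orders-value", "")` keeps the empty-string field, while `build_schema_payload("orders-value", None)` omits it.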

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix NameError: restore expected_* dicts used by DS assertions in test_collect_cluster_metadata

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Fix global_compatibility is-None guard and add missing docstring

- `if global_compatibility is not None:` to match the subject_compat fix and
  avoid silently dropping an empty-string API response
- Add one-liner docstring to _get_schema_registry_global_compatibility for
  consistency with its sibling method

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Wrap over-length list comprehension in compatibility-flip test

The schema-event filter exceeded the 120-char line limit. Split the
data-streams payload extraction and the config_type filter into two
readable comprehensions.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address all AAraKKe suggestions: helper, log level, compat fallbacks, batch clamp, test cleanup

- Extract _schema_registry_get helper; all five GET helpers become one-liners
- Raise per-subject compat fetch log level from debug to warning
- Persist global_compatibility on success; restore last known value on failure
- Clamp compat_due to max(0, BATCH_SIZE - len(subjects_needing_full_fetch)) to keep total bounded
- Standalone loop skips subject in schema_responses (not subjects_needing_full_fetch) so a failed schema fetch no longer silently discards the compat value
- Fall back to cached compatibility when version-bump path gets None from a failed fetch
- Parameterise mock_schema_registry_methods with global_compat/subject_compat; remove call-site overrides
- Move expected_* dicts adjacent to their consumers in test_collect_cluster_metadata
- Add three failure-path tests: subject compat failure preserves cached value, global compat failure uses last known, None compat omits field from payload
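
The batch clamp from the list above can be sketched like this. `BATCH_SIZE = 200` and the `max(0, ...)` expression come from the commit messages; the function names and the list-based inputs are assumptions made for the example.

```python
BATCH_SIZE = 200  # per-run batch size quoted earlier in this PR


def compat_budget(num_full_fetch: int, batch_size: int = BATCH_SIZE) -> int:
    """Slots left for standalone compatibility refreshes in this run."""
    # max(0, ...) avoids a negative slice bound when the full-fetch wave
    # alone already exceeds the batch size.
    return max(0, batch_size - num_full_fetch)


def clamp_compat_due(compat_due: list, subjects_needing_full_fetch: list,
                     batch_size: int = BATCH_SIZE) -> list:
    """Trim the standalone refresh list so both waves stay bounded together."""
    budget = compat_budget(len(subjects_needing_full_fetch), batch_size)
    return compat_due[:budget]
```

For instance, with 50 subjects needing a full fetch the standalone wave is limited to 150 subjects, and with 250 it is limited to none.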

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address review: global-compat None fallback, parallel-fetch helper, emit-loop extraction, test dedup

- Fall back to cached global compatibility on a None /config response, not just on error
- Extract _parallel_fetch helper to collapse the triplicated ThreadPoolExecutor blocks
- Extract _emit_schema_registry_events from _collect_schema_registry_info
- Document the TTL-reset-on-failure, defaultToGlobal None semantics, and post-upgrade re-emission
- Add schema_ds_events and mock_compatibility_methods test helpers; cover the global-compat-only flip
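
A helper of the kind described above might look roughly like the following. The real `_parallel_fetch` signature is not shown in this PR, so the parameters, the `None`-on-failure convention, and the `resolve_global_compatibility` name are assumptions for illustration.

```python
from __future__ import annotations

from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Iterable


def parallel_fetch(fetch: Callable[[str], Any], keys: Iterable[str],
                   max_workers: int = 8) -> dict:
    """Call fetch(key) for every key concurrently; failures map to None."""
    keys = list(keys)
    results: dict = {}
    if not keys:
        return results
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {key: executor.submit(fetch, key) for key in keys}
        for key, future in futures.items():
            try:
                results[key] = future.result()
            except Exception:
                # One failing subject should not abort the whole wave.
                results[key] = None
    return results


def resolve_global_compatibility(fetched: str | None, cached: str | None) -> str | None:
    """Prefer the fresh value; fall back to the last known one on None."""
    return fetched if fetched is not None else cached
```

Returning `None` for a failed key lets the caller apply the same cached-value fallback for both errors and empty responses.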

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Address round-2 review: cache-key ordering comment, _wire_cache test helper, relocate _make_schema_registry_check

- Document the ordering invariant between the two latest_version_cache write sites
- Note the per-subject compatibility-flip cadence latency in the inline comment
- Add a _wire_cache helper and adopt it across the schema-registry tests, removing the
  duplicated cache_storage/mock_read/mock_write scaffolding
- Move _make_schema_registry_check up near the other shared helpers and migrate the
  inline instance/client wiring in the older schema-registry tests to use it

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Address round-3 review: type-annotate _schema_registry_get, extract compat/schema-info helpers, test compat-skip

- Annotate _schema_registry_get as (**kwargs: Any) -> Any
- Extract _collect_subject_compatibilities from _collect_schema_registry_info, documenting that
  the batch-size clamp is not a hard ceiling on /config/{subject} calls
- Extract _build_schema_info to share the canonical payload dict between the fetched and
  cache-reconstruction paths
- Add a test asserting compatibility is not refetched when its cadence cache is fresh

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Address round-4 review: lift compat cache mutation to call site, add SchemaInfo TypedDict, parametrize flip tests

- Remove the hidden side effect from _collect_subject_compatibilities: it now only fetches and
  returns, with the standalone-flip cache mutation applied explicitly at the call site
- Add a SchemaInfo TypedDict and annotate _build_schema_info's return with it
- Merge the two near-identical compatibility-flip tests into one parametrized test (subject/global)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* Fix ruff formatting after rebase conflict resolution

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Rename changelog entry from changed to added for PR #23778

Not a breaking change — existing events are deduplicated, not removed.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…#23902)

* Emit connection_error DSM event when Kafka connection fails

When cluster monitoring is enabled and request_metadata_update() fails,
emit a data-streams-message with config_type=connection_error and a
reason field carrying the exception message, so consumers of the
heartbeat pipeline can distinguish unreachable clusters from healthy ones.
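
As a rough sketch, the payload for such an event could be assembled as below. The `config_type` and `reason` fields come from this commit message, and `collection_timestamp` and `bootstrap_servers` are named in the review follow-up; the function name, the exact field layout, and the timestamp unit are assumptions.

```python
from __future__ import annotations

import json
import time


def build_connection_error_event(bootstrap_servers: str, error: Exception,
                                 now: float | None = None) -> str:
    """Serialize a connection_error event carrying the exception message."""
    event = {
        "collection_timestamp": int(now if now is not None else time.time()),
        "bootstrap_servers": bootstrap_servers,
        "config_type": "connection_error",
        "reason": str(error),
    }
    return json.dumps(event)
```

A consumer can then tell an unreachable cluster apart from a healthy one by checking `config_type` on each decoded message.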

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add changelog entry for PR #23902

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Address review comments on connection_error heartbeat

- Guard emission call in try/except so errors from event_platform_event
  never mask the AdminClient connection error message
- Extract _emit_cluster_monitoring_event helper to share collection_timestamp,
  bootstrap_servers, and event_platform_event call across both senders
- Add match= to pytest.raises in all three new tests
- Extract _connection_error_events helper to deduplicate test extraction logic
- Move import json to stdlib block (isort)
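
The guard described in the first bullet above can be sketched as a nested try/except. The callables passed in are stand-ins (hypothetical) for the AdminClient call, the event sender, and the logger; the actual method names in the check may differ.

```python
from typing import Callable


def refresh_metadata(request_metadata_update: Callable[[], None],
                     emit_event: Callable[[dict], None],
                     log_warning: Callable[[str], None]) -> None:
    """Report a connection failure without letting the report mask it."""
    try:
        request_metadata_update()
    except Exception as connection_error:
        try:
            emit_event({"config_type": "connection_error",
                        "reason": str(connection_error)})
        except Exception as emit_error:
            # A broken sink is logged, never raised in place of the real error.
            log_warning(f"failed to emit connection_error event: {emit_error}")
        # Re-raise the original connection error so callers see the real cause.
        raise
```

Because the inner `except` swallows the sink failure, the bare `raise` re-raises the original connection error rather than the emission error.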

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Apply ruff format

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

* Add regression test: sink failure must not mask AdminClient error

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

---------

Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com>
* update vSphere dashboard

* remove tag

* filter correctly
pull[bot] locked and limited conversation to collaborators on Jun 9, 2026
pull[bot] added the ⤵️ pull label on Jun 9, 2026
pull[bot] merged commit b1d8267 into ConnectionMaster:master on Jun 9, 2026
1 check passed
4 participants